Coronavirus disease 2019 (COVID-19) is currently one of the most dangerous diseases.
It threatens the whole world, and almost all countries, including Indonesia, are
experiencing the COVID-19 pandemic. Various tests are used to detect COVID-19, such as
swab tests, rapid tests, and antigen tests. Another way to detect COVID-19 infection is
to examine X-ray images of the patient's lungs, because the lungs of an infected person
appear different from those of a healthy person. Many studies have been carried out to
detect COVID-19 using either machine learning (ML) or deep learning (DL). In this study,
we propose using transfer learning as a feature extractor in the classification of a
COVID-19 dataset. The experiment was conducted four times using four different methods,
namely ResNet 50, MobileNet V2, Inception V3, and DenseNet-201. After the experiments,
we compared the results to find out which method performs best in detecting COVID-19.
The ResNet 50 model achieved the best results, with 92.3% accuracy, 93% precision,
93% F1-score, 99% sensitivity, and 90.7% specificity.

Keywords: Deep learning, Transfer learning

1. INTRODUCTION 

Coronavirus disease 2019 (COVID-19) is a disease caused by a virus that has infected people almost
all over the world. The virus was first detected in Wuhan, China, in 2019, and is thought to have
originated in bats before spreading to humans [1]. The number of COVID-19 patients continues to
increase in all countries, including Indonesia [2]. According to the real-time tracking of Dong et al. [3],
the number of COVID-19 cases in Indonesia had reached 788,402 by January 2021, of which 653,000
patients were declared recovered and 23,296 people were declared dead. This number is still increasing
every day in all cities in Indonesia. Therefore, serious treatment and prevention are needed to deal with
this virus. Detecting COVID-19 is very challenging because the virus is still new and few datasets are
available. Several diagnostic methods can already be used to detect COVID-19, such as rapid antigen or
antibody tests, immunoenzymatic serological tests, and molecular tests based on RT-PCR [4]. Another
method is to perform an X-ray scan of the patient's lungs and compare it with healthy human lungs.
These techniques can contribute to the screening and successful monitoring of diagnosed cases [5]-[7].

A large body of research on the detection of COVID-19 has already been carried out. He et al. [8]
used a deep learning method to build a program that diagnoses COVID-19 from CT scan data, obtaining
an F1 score of 85% and an AUC of 94%. This result is quite good given the small dataset of only 349 CT
images from 216 patients. Henderi et al. [9] used the forward chaining method to create a model of a
decision support system for diagnosing patients exposed to COVID-19. In this system, people who test
positive for COVID-19 are placed in a room where their health is observed, and the collected data are
then used to support the decision on whether someone has COVID-19. Yang et al. [10] also built a
program to diagnose COVID-19 from CT scan data, obtaining an F1 score of 90%, an AUC of 98%, and
an accuracy of 89%. These results are likewise quite good given the small dataset of only 349 CT images
from 216 patients. Altan and Karasu [11] used X-ray scans. Their model was evaluated on 1596 chest
X-ray images, and the findings demonstrate that it can accurately distinguish COVID-19, normal, and
viral pneumonia images.

Journal homepage: http://beei.org
ISSN: 2302-9285

Transfer learning has proven effective for classifying image data in many fields. For example, in
tourism it has been used to explore tourists' urban images based on geotagged photos [12]. Transfer
learning is also used in agricultural and applied economics [13] and can be used to predict wind
speed [14]. Motivated by these results, this study aims to utilize several transfer learning methods to
classify lung x-ray image data. Transfer learning is the notion of reusing layers of a pre-trained model in
a comparable domain [15].

Several researchers have tried to solve medical problems using transfer learning algorithms on
image data. One paper shows that Inception V3 outperforms a deep convolutional neural network
(DCNN) in classifying pulmonary images [16]. Lung CT-scan images were also used in a study that
compared several transfer learning algorithms and concluded that DenseNet201 is the best of them [17].
MobileNet and a classical CNN were compared for classifying thoracic (pulmonary) diseases in x-ray
images, and the results show that the improved MobileNet outperforms the basic CNN [18]. The research
entitled “Deep learning for diagnosis of COVID-19 using 3D CT scans” proposed a ResNet-50 model
combined with majority voting. The results reveal that the presented ResNet-50 model, used in
conjunction with majority voting, outperforms all other models and fusion approaches considered [19].

We conducted several experiments using four transfer learning methods, namely Inception V3,
ResNet 50, MobileNet V2, and DenseNet 201. The classification results are then compared to find out
which transfer learning method performs best in classifying x-ray image data. The rest of the paper is
structured as follows. Section 2 gives a brief overview of the dataset, data pre-processing, CNN
classifiers, and the pre-trained models. The experimental findings and discussion are presented in
section 3. Finally, section 4 draws conclusions from the experiments.


2. MATERIAL AND METHODS 

This paper describes the classification of lung x-ray image data to identify COVID-19. We
conducted several experiments using four CNN architectures, namely ResNet-50, MobileNetV2,
InceptionV3, and DenseNet-201. The pipeline consists of data pre-processing, feature extraction,
classification, and evaluation of the resulting model using a confusion matrix. Model performance is
measured by accuracy, precision, F1-score, sensitivity, and specificity.
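
To make the five measures concrete, the following minimal sketch derives them from the four cells of a binary confusion matrix. It is an illustration under stated assumptions, not necessarily the code used in this study: covid is assumed to be the positive class, the names `ConfusionMatrix` and `evaluate` are hypothetical, and the counts are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary problem; covid is assumed to be the positive class."""

    tp: int  # covid predicted as covid
    fp: int  # normal predicted as covid
    fn: int  # covid predicted as normal
    tn: int  # normal predicted as normal


def evaluate(cm: ConfusionMatrix) -> dict:
    """Return the five reported metrics as fractions between 0 and 1."""
    precision = cm.tp / (cm.tp + cm.fp)
    sensitivity = cm.tp / (cm.tp + cm.fn)  # recall of the covid class
    specificity = cm.tn / (cm.tn + cm.fp)  # recall of the normal class
    return {
        "accuracy": (cm.tp + cm.tn) / (cm.tp + cm.fp + cm.fn + cm.tn),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts, not taken from the experiments in this paper.
metrics = evaluate(ConfusionMatrix(tp=90, fp=10, fn=5, tn=95))
```

Sensitivity and specificity are the per-class recalls, which is why both are reported alongside accuracy when the classes are imbalanced.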


2.1. Dataset 

This study was conducted using a dataset of lung x-ray images collected from several sources. The
dataset can be downloaded for free from Kaggle [20], [21]. From the dataset, we took only two classes,
namely the normal class and the covid class, totaling 13,808 images: 10,192 normal images and 3,616
covid images. The dataset is summarized in Table 1. An example of a COVID-19 x-ray image is shown in
Figure 1(a), and a normal x-ray image is shown in Figure 1(b).


2.2. Image augmentation 

The first stage is to make the images smaller so that the classification process is faster. The
original images are 299x299 pixels in PNG format and are resized to 224x224 pixels. Adequate image
quality is essential for an optimal classification result, hence noise removal is a prerequisite [22]. We
applied the following operations to the training data: rescaling the pixel values by 1/255, horizontal flip,
and vertical flip. Augmentation serves to increase the number of lung x-ray images by flipping each
image horizontally and vertically. The augmentation process is carried out only on the training data, not
on the testing and validation data. The block diagram of this process is shown in Figure 2. The images
are then divided into three parts, namely training, testing, and validation; when performing learning and
inference experiments, the train/test split approach is used to validate the experiment [23]. The images
are distributed in a ratio of 8:1:1, giving 11,047 training images, 1,380 testing images, and 1,380
validation images.
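
The pre-processing steps described above can be sketched in plain Python as follows. This is an illustration only: images are assumed to be grayscale and stored as nested lists, the function names and the fixed shuffling seed are hypothetical, and assigning the remainder of the 8:1:1 split to the training set is our assumption about the rounding.

```python
import random


def rescale(image, factor=1 / 255):
    """Map 8-bit pixel values to the range [0, 1]."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in reversed(image)]


def split_8_1_1(items, seed=0):
    """Shuffle and split into training, testing, and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 10
    n_val = len(shuffled) // 10
    n_train = len(shuffled) - n_test - n_val  # remainder goes to training
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val


image = [[0, 255], [51, 102]]
train, test, val = split_8_1_1(range(20))
```

Only `horizontal_flip` and `vertical_flip` would be applied to the training part, so the testing and validation images remain unmodified apart from rescaling.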


Table 1. Lung x-ray dataset

No   Class    Images
1    Normal   10,192
2    Covid     3,616
     Total    13,808


Figure 1. Lung x-ray images (a) Covid and (b) normal


Figure 2. Research block diagram


2.3. Proposed model 

Transfer learning is a technique that can be used to overcome the difficulty of applying deep
learning methods to scarce data. It reuses a CNN model that has been pre-trained on the very large
ImageNet dataset, whose widely used ILSVRC subset contains 1,000 classes and more than one million
images. The use of transfer learning can reduce training time because we do not need to train the model
from scratch. In the training process, we modify the architecture by freezing several layers of the
pre-trained model and changing the output neurons according to our needs [24]. Figure 3 shows our
proposed transfer learning architecture in detail. This research has three main phases: transfer learning
configuration, adaptation, and fine-tuning. In the first phase, we configure the head of the network: a 2D
global average pooling layer followed by a dense layer with 2 units and a sigmoid activation function. In
the adaptation phase, we set the learning rate to 5e-5, train for 10 initial epochs, and then save the
model. In the final fine-tuning phase, we load the model checkpoint, compile the model again, and
re-save the model.
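
The configuration phase can be sketched with the Keras API as shown below. This is a hedged example rather than the exact implementation used in this study: the choice of `tf.keras`, the three-channel input shape, freezing the whole backbone (the text only states that several layers are frozen), and the names `CONFIG` and `build_transfer_model` are assumptions. The Adam optimizer and binary cross-entropy loss follow section 2.9.

```python
# Settings taken from the text; everything else below is an assumption.
CONFIG = {
    "input_shape": (224, 224, 3),
    "output_units": 2,
    "activation": "sigmoid",
    "learning_rate": 5e-5,
    "initial_epochs": 10,
}


def build_transfer_model(config=CONFIG, weights="imagenet"):
    """Build a frozen ResNet50 backbone with the classification head."""
    # Imported lazily so the settings can be inspected without TensorFlow.
    import tensorflow as tf

    backbone = tf.keras.applications.ResNet50(
        include_top=False,
        weights=weights,
        input_shape=config["input_shape"],
    )
    backbone.trainable = False  # freeze the pre-trained feature extractor

    model = tf.keras.Sequential([
        backbone,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(
            config["output_units"], activation=config["activation"]
        ),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=config["learning_rate"]
        ),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Swapping `ResNet50` for `InceptionV3`, `MobileNetV2`, or `DenseNet201` from `tf.keras.applications` would give the other three backbones with the same head. Training would then call `model.fit` for `CONFIG["initial_epochs"]` epochs on the training data.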


Figure 3. Transfer learning architecture detail


2.4. Convolutional neural network 

The convolutional neural network (CNN) is a well-known deep learning concept inspired by the
innate visual perception mechanism of living creatures. CNNs have become one of the most popular
machine learning techniques for detecting visual objects. Convolutional, pooling, and fully connected
layers are the three essential components of a CNN. The convolutional layers are responsible for learning
feature representations of the input. Each of these layers has a large number of convolutional kernels for
computing various feature maps [25]. Each neuron in a feature map is connected to a local region of
neurons in the previous layer. The basic architecture of a CNN is illustrated in Figure 4. A new feature
map is generated by first convolving the input with a learned kernel and then applying an element-wise
nonlinear activation function to the convolved outputs [26].
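
The generation of a feature map can be illustrated with a small pure-Python sketch. It assumes a single-channel input, stride 1, and no padding, and it slides the kernel without flipping it, as deep learning libraries commonly do. The function names and the toy values are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise nonlinear activation max(v, 0)."""
    return [[max(v, 0) for v in row] for row in feature_map]


image = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]
kernel = [[1, -1], [0, 1]]
convolved = conv2d_valid(image, kernel)
feature_map = relu(convolved)
```

Each output value depends only on the 2x2 patch under the kernel, which is the local connectivity described above.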


Figure 4. The basic architecture of CNN


Starting with the input signal x, CNNs have a hierarchical architecture, with each successive layer $x_j$
defined by:

$$x_j = \rho\left(W_j x_{j-1}\right) \qquad (1)$$

where $W_j$ is a linear operator in the convolution layer, and $\rho$ is a rectifier $\max(x, 0)$ or a
sigmoid $1/(1 + \exp(-x))$.

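
Equation (1) can be traced numerically with the following minimal sketch. For brevity each linear operator is written as a small dense matrix, which is an assumption made for the example; a convolution is a linear operator of the same kind with shared weights. The function names and the toy weights are hypothetical.

```python
import math


def rectifier(v):
    return max(v, 0.0)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def apply_layer(weights, x, rho):
    """One step of equation (1): rho applied element-wise to W_j x_{j-1}."""
    return [rho(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def forward(layers, x, rho=rectifier):
    """Apply the layers in order, starting from the input signal x."""
    for weights in layers:
        x = apply_layer(weights, x, rho)
    return x


layers = [
    [[1.0, -2.0], [0.0, 2.0]],  # W_1
    [[1.0, 1.0]],               # W_2
]
output = forward(layers, [3.0, 2.0])
```

In this toy case the first layer produces (-1, 4), the rectifier sets the negative entry to zero, and the second layer sums the two entries.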
2.5. Inception V3 

The fundamental feature of this architecture is its improved use of computing resources within
the network. Through a carefully crafted design, the depth and width of the network were increased
while keeping the computational budget constant; the Inception module is illustrated in Figure 5. The
architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing
to optimize quality [27]. This deep CNN architecture is codenamed Inception.

Bulletin of Electr Eng & Inf, Vol. 11, No. 2, April 2022: 1091-1099


[Diagram: the previous layer feeds parallel 1x1, 3x3, and 5x5 convolutions and a 3x3 max pooling, whose outputs go to filter concatenation]

Figure 5. Inception module for naive version


The core idea of the Inception architecture is to consider how the optimal local sparse structure
of a convolutional vision network can be approximated and covered by readily available dense
components. The next step is to find the best local structure and repeat it spatially. Each unit from an
earlier layer corresponds to a certain region of the input image, and these units are grouped into filter
banks. In the lower layers (those closest to the input), correlated units concentrate in local regions. As a
result, many clusters are concentrated in a single region and can be covered by a layer of 1x1
convolutions in the next layer. However, there are also a smaller number of more spatially spread-out
clusters that can be covered by convolutions over larger patches, and the number of patches decreases
as the regions become larger. To avoid patch-alignment issues, current versions of the Inception
architecture are limited to filter sizes of 1x1, 3x3, and 5x5 [28]. Compared with shallower and narrower
designs, the main advantage of this technique is a significant quality gain at a modest increase in
computational requirements.
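
A short sketch of the bookkeeping for the naive module in Figure 5 is given below. The filter counts in the example are hypothetical and do not correspond to any layer of Inception V3, and the function names are ours.

```python
def naive_inception_output_channels(in_channels, n1x1, n3x3, n5x5):
    """Channels after filter concatenation in the naive module.

    The 3x3 max-pooling branch has no filters of its own, so it passes
    all in_channels through to the concatenation.
    """
    return n1x1 + n3x3 + n5x5 + in_channels


def naive_inception_weights(in_channels, n1x1, n3x3, n5x5):
    """Number of convolution weights in the three branches (no biases)."""
    return in_channels * (1 * 1 * n1x1 + 3 * 3 * n3x3 + 5 * 5 * n5x5)


out_channels = naive_inception_output_channels(192, 64, 128, 32)
weights = naive_inception_weights(192, 64, 128, 32)
```

With these illustrative numbers the module turns 192 input channels into 416 output channels, which shows how quickly the channel count grows when naive modules are stacked.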


2.6. ResNets 

Deeper neural networks are harder to train. The authors of ResNet present a residual learning
framework for training networks that are substantially deeper than those used previously. The layers
are explicitly reformulated as learning residual functions with reference to the layer inputs, instead of
learning unreferenced functions. The illustration is shown in Figure 6.

Let H(x) denote the desired underlying mapping. The stacked nonlinear layers are made to fit
another mapping F(x):=H(x)-x, so the original mapping is recast into F(x)+x. These residual networks
are easier to optimize and can gain accuracy from considerably increased depth [29]. The networks are
built by stacking residual blocks on top of each other: a ResNet-50, for example, has fifty layers made up
of these blocks. In this research, we use ResNet-50 as one of the transfer learning architectures.
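
The recast mapping F(x)+x can be illustrated with the following minimal sketch of a residual block. It assumes two small dense weight layers in place of the convolutions used in ResNet-50, and the function names and weights are hypothetical.

```python
def relu(vector):
    return [max(v, 0.0) for v in vector]


def matvec(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def residual_block(x, w1, w2):
    """Return relu(F(x) + x), where F(x) = W2 relu(W1 x)."""
    fx = matvec(w2, relu(matvec(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
identity_out = residual_block([1.0, 2.0], zeros, zeros)

w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[1.0, 0.0], [0.0, -1.0]]
block_out = residual_block([1.0, 2.0], w1, w2)
```

When the weight layers output zero, the block reduces to the identity on non-negative inputs, which is one intuition for why adding such blocks does not make a deep network harder to optimize.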


Figure 6. ResNet residual learning is a crucial component of the learning process


2.7. MobileNets 

MobileNets are based on a streamlined architecture that uses depthwise separable convolutions
to build lightweight deep neural networks. The architecture introduces two global hyperparameters that
trade off latency against accuracy. These hyperparameters let the model builder choose a model of the
right size for the application based on the constraints of the problem.

MobileNet employs 3x3 depthwise separable convolutions, which require 8 to 9 times less
computation than standard convolutions at only a small reduction in accuracy. The researchers
investigated some of the key design decisions that led to an efficient model [30]. They then showed how
to use a width multiplier and a resolution multiplier to build smaller and faster MobileNets by trading
off a reasonable amount of accuracy for reduced size and latency.
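
The saving of 8 to 9 times can be checked by counting multiply-add operations, as in the sketch below. The layer dimensions in the example are chosen for illustration, and the function names are hypothetical.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-adds of a k x k convolution with m input channels,
    n output channels, and an f x f output feature map."""
    return k * k * m * n * f * f


def separable_conv_cost(k, m, n, f):
    """Multiply-adds of the depthwise separable replacement."""
    depthwise = k * k * m * f * f  # one k x k filter per input channel
    pointwise = m * n * f * f      # 1x1 convolution that mixes channels
    return depthwise + pointwise


k, m, n, f = 3, 512, 512, 14
standard = standard_conv_cost(k, m, n, f)
separable = separable_conv_cost(k, m, n, f)
saving = standard / separable
```

The cost ratio simplifies to 1/n + 1/k^2, so for 3x3 kernels the saving approaches 9 as the number of output channels grows.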


2.8. DenseNet 201 

The dense convolutional network (DenseNet) is a feed-forward network in which each layer is
connected to all the layers that precede it. DenseNets show a steady increase in accuracy as the number
of parameters increases, with no evidence of performance degradation or overfitting. They achieved
state-of-the-art performance on several highly competitive datasets. The shorter connections may let
individual layers receive more supervision from the loss function, which might explain why DenseNets
are more accurate [31], [32]. Rather than drawing representational power from extremely deep or wide
architectures, DenseNets reuse features to exploit the network's potential, resulting in condensed
models that are easy to train and parameter efficient. Concatenating the feature maps learned by
different layers increases variation in the input of subsequent layers and improves efficiency.
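
The concatenation pattern can be sketched as follows. For simplicity each feature map is reduced to a single number and each layer to a plain function, which is an assumption made for illustration; the function names and the values 64 and 32 in the example are likewise illustrative.

```python
def dense_block(x, layers):
    """Feed each layer the concatenation of all earlier feature maps."""
    features = list(x)
    for layer in layers:
        features = features + layer(features)  # concatenate, do not add
    return features


def input_channels(k0, growth_rate, layer_index):
    """Feature maps seen by layer number layer_index (counted from 1)."""
    return k0 + growth_rate * (layer_index - 1)


def toy_layer(features):
    """Stand-in for a convolution that emits one new feature map."""
    return [sum(features)]


block_out = dense_block([1.0, 2.0], [toy_layer] * 3)
channels_at_layer_6 = input_channels(64, 32, 6)
```

Because earlier feature maps are kept rather than overwritten, every later layer can reuse them directly, which is the feature reuse described above.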


2.9. Training and testing 

This research was conducted using the Python programming language, with several dedicated
libraries for running deep learning models. The experiments were run on Google's cloud computing
platform with a GPU, which shortens the time needed to train a model. We use a sigmoid activation
function at the output layer, binary cross-entropy as the loss function, Adam as the optimizer, a
learning rate of 5e-5, and 10 epochs. The augmentation process has several stages, namely rescaling by
1/255, horizontal flip, and vertical flip; the horizontal and vertical flips are applied only to the training
data, not to the testing and validation data. For feature extraction, we use models pre-trained on the
large ImageNet dataset, namely InceptionV3, ResNet50, MobileNetV2, and DenseNet-201. Using
pre-trained models simplifies the research because we do not need to train the feature extractor; this
technique is called transfer learning. The only modification we need to make to the architecture is to
change its head according to the needs of the dataset in use. In this case, we use two neurons at the
output because the desired classification has two classes. The training curves are illustrated in
Figure 7(a) for the InceptionV3 model, Figure 7(b) for the MobileNetV2 model, Figure 7(c) for the
ResNet-50 model, and Figure 7(d) for the DenseNet-201 model.
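
The combination of a sigmoid output and a binary cross-entropy loss can be illustrated with the sketch below. The one-hot target encoding over the two output neurons, the clipping constant, and the function names are assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy over the output neurons."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


target = [1, 0]  # assumed encoding: first neuron covid, second normal

undecided = binary_cross_entropy(target, [sigmoid(0.0), sigmoid(0.0)])
confident = binary_cross_entropy(target, [sigmoid(4.0), sigmoid(-4.0)])
```

An undecided prediction of 0.5 on both neurons gives a loss of ln 2, while a confident and correct prediction gives a loss close to zero, which is what the optimizer drives toward during training.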


Figure 7. Training and validation accuracy (a) InceptionV3, (b) MobileNetV2, (c) ResNet-50, and (d) DenseNet-201


3. EXPERIMENTAL RESULTS AND DISCUSSION 

The testing phase was carried out using Python as the programming language and Google's
cloud computing platform because of its fast performance. The study used a dataset of 13,808 images,
consisting of 3,616 covid images and 10,192 normal images. Even though the dataset is very
imbalanced, the transfer learning approach yields quite good results. The complete experimental
results are shown in Table 2.

Table 2 shows the classification results for the lung x-ray images (normal or covid). The best
accuracy is obtained by the ResNet 50 model, at 92.3%. Conversely, the model with the lowest accuracy
is Inception V3, at 88.6%. The model with the best precision is ResNet 50, at 93%, while the model with
the worst precision is Inception V3, at 90%. The models with the best sensitivity are ResNet 50 and
DenseNet 201, at 99%. For specificity, ResNet 50 scores 90.7%, close to the highest value in Table 2
(91% for MobileNet V2). Thus, it can be concluded that the best overall model for classifying the covid
x-ray image data is ResNet 50. In a previous study, Musleh and Maghari [33] obtained an accuracy of
89.7% using the CheXNet algorithm, compared with 92.3% in this research. In addition, Wang et al. [34]
obtained a sensitivity of 80% using the COVID-Net approach to detect covid in x-ray images, compared
with 99% in this study.


Table 2. Classification result

No   Model          Accuracy (%)   Precision (%)   F1-score (%)   Sensitivity (%)   Specificity (%)
1    Inception V3   88.6           90              87             98.1              86.8
2    MobileNet V2   92.2           92              92             95                91
3    ResNet 50      92.3           93              93             99                90.7
4    DenseNet 201   91.6           92              91             99                89.8
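
The comparison across models can be reproduced from the values in Table 2 with a few lines of Python. The sketch only re-reads the table; the dictionary layout and the name `best_models` are hypothetical.

```python
# Values copied from Table 2 (all in percent).
RESULTS = {
    "Inception V3": {"accuracy": 88.6, "precision": 90, "f1": 87,
                     "sensitivity": 98.1, "specificity": 86.8},
    "MobileNet V2": {"accuracy": 92.2, "precision": 92, "f1": 92,
                     "sensitivity": 95, "specificity": 91},
    "ResNet 50": {"accuracy": 92.3, "precision": 93, "f1": 93,
                  "sensitivity": 99, "specificity": 90.7},
    "DenseNet 201": {"accuracy": 91.6, "precision": 92, "f1": 91,
                     "sensitivity": 99, "specificity": 89.8},
}


def best_models(results, metric):
    """Return every model that reaches the top value for the metric."""
    top = max(row[metric] for row in results.values())
    return [name for name, row in results.items() if row[metric] == top]


best_accuracy = best_models(RESULTS, "accuracy")
best_sensitivity = best_models(RESULTS, "sensitivity")
best_specificity = best_models(RESULTS, "specificity")
```

Returning a list rather than a single name keeps ties visible, as in the sensitivity column where two models share the top value.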


4. CONCLUSION 

Diagnosing COVID-19 from x-ray images is challenging because the dataset is large and very
imbalanced. In this study, we proposed using several transfer learning methods for data classification.
Transfer learning can provide good performance when used to classify medical image data, even though
the CNN architectures were pre-trained on the ImageNet dataset, which contains little medical data.
With transfer learning, we do not need to build a model architecture from scratch; we only change the
output layers. Our model successfully classifies two classes of lung x-ray data, and the ResNet 50 model
has the best results with 92.3% accuracy, 93% precision, 93% F1-score, 99% sensitivity, and 90.7%
specificity. Given these good results, we hope this research can help in detecting COVID-19 quickly.